Stochastic sampling of the RNA structural alignment space
Abstract
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
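The two ideas in the abstract, sampling in proportion to Boltzmann weights and choosing a representative of a sample, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pseudo-free energy values are assumed to be given (the abstract does not state how the base pairing and alignment probabilities are combined), the function names are hypothetical, and the representative is approximated by the sampled structure with the smallest total base-pair distance to the others, which may differ from the centroid definition used in the paper.

```python
import math
import random

# Assumption: gas constant times temperature in kcal/mol at about 310 K.
RT = 0.616


def boltzmann_probabilities(pseudo_energies, rt=RT):
    """Convert pseudo-free energy changes (kcal/mol) into probabilities."""
    lowest = min(pseudo_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / rt) for e in pseudo_energies]
    total = sum(weights)
    return [w / total for w in weights]


def sample_indices(pseudo_energies, n, seed=0):
    """Draw n candidate indices in proportion to their Boltzmann weights."""
    rng = random.Random(seed)
    probs = boltzmann_probabilities(pseudo_energies)
    return rng.choices(range(len(pseudo_energies)), weights=probs, k=n)


def base_pair_distance(a, b):
    """Count base pairs that occur in exactly one of the two structures."""
    return len(set(a) ^ set(b))


def representative(structures):
    """Return the sampled structure closest, in total, to all the others."""
    return min(
        structures,
        key=lambda s: sum(base_pair_distance(s, t) for t in structures),
    )


if __name__ == "__main__":
    # Hypothetical candidates: each structure is a set of (i, j) base pairs.
    candidates = [
        frozenset({(1, 10), (2, 9)}),
        frozenset({(1, 10), (2, 9), (3, 8)}),
        frozenset({(1, 10)}),
    ]
    energies = [-3.0, -2.5, -1.0]  # made-up pseudo-free energies
    drawn = [candidates[i] for i in sample_indices(energies, n=100)]
    print(sorted(representative(drawn)))
```

Here each structure is a set of paired positions, so the distance between two structures is the size of the symmetric difference of their base-pair sets. A joint sampler over alignments and structures would replace the enumerated candidate list with a recursive stochastic traceback, which this sketch does not attempt.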
Related articles
RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences
MOTIVATION RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary stru...
Pair hidden Markov models on tree structures
MOTIVATION Computationally identifying non-coding RNA regions on the genome has much scope for investigation and is essentially harder than gene-finding problems for protein-coding regions. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignments of RNA sequences. On the other hand, Hidden Markov Models...
RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment
MOTIVATION Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remai...
Murlet: a practical multiple alignment tool for structural RNA sequences
MOTIVATION Structural RNA genes exhibit unique evolutionary patterns that are designed to conserve their secondary structures; these patterns should be taken into account while constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computa...
Detecting conserved secondary structures in RNA molecules using constrained structural alignment
Constrained sequence alignment has been studied extensively in the past. Different forms of constraints have been investigated, where a constraint can be a subsequence, a regular expression, or a probability matrix of symbols and positions. However, constrained structural alignment has been investigated to a much lesser extent. In this paper, we present an efficient method for constrained struc...